{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Bfrh3DUze0QN"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sx-jnufYfcJG"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "s1bQihY6-Y4N"
      },
      "source": [
        "# Pandas DataFrame to Fairness Indicators Case Study\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XHTjeiUMeolM"
      },
      "source": [
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Pandas_Case_Study\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/fairness-indicators/blob/master/g3doc/tutorials/Fairness_Indicators_Pandas_Case_Study.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/fairness-indicators/tree/master/g3doc/tutorials/Fairness_Indicators_Pandas_Case_Study.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/fairness-indicators/g3doc/tutorials/Fairness_Indicators_Pandas_Case_Study.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ay80altXzvgZ"
      },
      "source": [
        "## Case Study Overview\n",
        "In this case study we will apply [TensorFlow Model Analysis](https://www.tensorflow.org/tfx/model_analysis/get_started) and [Fairness Indicators](https://www.tensorflow.org/tfx/guide/fairness_indicators) to evaluate data stored as a Pandas DataFrame, where each row contains ground truth labels, various features, and a model prediction. We will show how this workflow can be used to spot potential fairness concerns, independent of the framework one used to construct and train the model. As in this case study, we can analyze the results from any machine learning framework (e.g. TensorFlow, JAX, etc) once they are converted to a Pandas DataFrame.\n",
        " \n",
        "For this exercise, we will leverage the Deep Neural Network (DNN) model that was developed in the [Shape Constraints for Ethics with Tensorflow Lattice](https://colab.research.google.com/github/tensorflow/lattice/blob/master/docs/tutorials/shape_constraints_for_ethics.ipynb#scrollTo=uc0VwsT5nvQi) case study using the Law School Admissions dataset from the Law School Admissions Council (LSAC). This classifier attempts to predict whether or not a student will pass the bar, based on their  Law School Admission Test (LSAT) score and undergraduate GPA. This classifier attempts to predict whether or not a student will pass the bar, based on their LSAT score and undergraduate GPA.\n",
        "\n",
        "## LSAC Dataset\n",
        "The dataset used within this case study was originally collected for a study called '[LSAC National Longitudinal Bar Passage Study. LSAC Research Report Series](https://eric.ed.gov/?id=ED469370)' by Linda Wightman in 1998. The dataset is currently hosted [here](http://www.seaphe.org/databases.php).\n",
        "\n",
        "*   **dnn_bar_pass_prediction**: The LSAT prediction from the DNN model.\n",
        "*   **gender**: Gender of the student.\n",
        "*   **lsat**: LSAT score received by the student.\n",
        "*   **pass_bar**: Ground truth label indicating whether or not the student eventually passed the bar.\n",
        "*   **race**: Race of the student.\n",
        "*   **ugpa**: A student's undergraduate GPA.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ob01ASKqixfw"
      },
      "outputs": [],
      "source": [
        "!pip install -q -U pip==20.2\n",
        "\n",
        "!pip install -q -U \\\n",
        "  tensorflow-model-analysis==0.36.0 \\\n",
        "  tensorflow-data-validation==1.5.0 \\\n",
        "  tfx-bsl==1.5.0"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tnxSvgkaSEIj"
      },
      "source": [
        "## Importing required packages:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0q8cTfpTkEMP"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import tempfile\n",
        "import pandas as pd\n",
        "import six.moves.urllib as urllib\n",
        "import pprint\n",
        "\n",
        "import tensorflow_model_analysis as tfma\n",
        "from google.protobuf import text_format\n",
        "\n",
        "import tensorflow as tf\n",
        "tf.compat.v1.enable_v2_behavior()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b8kWW3t4-eS1"
      },
      "source": [
        "## Download the data and explore the initial dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wMZJtgj0qJ0x"
      },
      "outputs": [],
      "source": [
        "# Download the LSAT dataset and setup the required filepaths.\n",
        "_DATA_ROOT = tempfile.mkdtemp(prefix='lsat-data')\n",
        "_DATA_PATH = 'https://storage.googleapis.com/lawschool_dataset/bar_pass_prediction.csv'\n",
        "_DATA_FILEPATH = os.path.join(_DATA_ROOT, 'bar_pass_prediction.csv')\n",
        "\n",
        "data = urllib.request.urlopen(_DATA_PATH)\n",
        "\n",
        "_LSAT_DF = pd.read_csv(data)\n",
        "\n",
        "# To simpliy the case study, we will only use the columns that will be used for\n",
        "# our model.\n",
        "_COLUMN_NAMES = [\n",
        "  'dnn_bar_pass_prediction',\n",
        "  'gender',\n",
        "  'lsat',\n",
        "  'pass_bar',\n",
        "  'race1',\n",
        "  'ugpa',\n",
        "]\n",
        "\n",
        "_LSAT_DF.dropna()\n",
        "_LSAT_DF['gender'] = _LSAT_DF['gender'].astype(str)\n",
        "_LSAT_DF['race1'] = _LSAT_DF['race1'].astype(str)\n",
        "_LSAT_DF = _LSAT_DF[_COLUMN_NAMES]\n",
        "\n",
        "_LSAT_DF.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GyeVg2s7-wlB"
      },
      "source": [
        "## Configure Fairness Indicators.\n",
        "There are several parameters that you’ll need to take into account when using Fairness Indicators with a DataFrame \n",
        "\n",
        "*   Your input DataFrame must contain a prediction column and label column from your model. By default Fairness Indicators will look for a prediction column called `prediction` and a label column called `label` within your DataFrame.\n",
        "   *   If either of these values are not found a KeyError will be raised.\n",
        "\n",
        "*   In addition to a DataFrame, you’ll also need to include an `eval_config` that should include the metrics to compute, slices to compute the metrics on, and the column names for example labels and predictions. \n",
        "   *   `metrics_specs` will set the metrics to compute. The `FairnessIndicators` metric will be required to render the fairness metrics and you can see a list of additional optional metrics [here](https://www.tensorflow.org/tfx/model_analysis/metrics).\n",
        "\n",
        "   *   `slicing_specs` is an optional slicing parameter to specify what feature you’re interested in investigating. Within this case study race1 is used, however you can also set this value to another feature (for example gender in the context of this DataFrame). If `slicing_specs` is not provided all features will be included.\n",
        "   *   If your DataFrame includes a label or prediction column that is different from the default `prediction` or `label`, you can configure the `label_key` and `prediction_key` to a new value.\n",
        "\n",
        "*   If `output_path` is not specified a temporary directory will be created."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "53caFasB5V9p"
      },
      "outputs": [],
      "source": [
        "# Specify Fairness Indicators in eval_config.\n",
        "eval_config = text_format.Parse(\"\"\"\n",
        "  model_specs {\n",
        "    prediction_key: 'dnn_bar_pass_prediction',\n",
        "    label_key: 'pass_bar'\n",
        "  }\n",
        "  metrics_specs {\n",
        "    metrics {class_name: \"AUC\"}\n",
        "    metrics {\n",
        "      class_name: \"FairnessIndicators\"\n",
        "      config: '{\"thresholds\": [0.50, 0.90]}'\n",
        "    }\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: 'race1'\n",
        "  }\n",
        "  slicing_specs {}\n",
        "  \"\"\", tfma.EvalConfig())\n",
        "\n",
        "# Run TensorFlow Model Analysis.\n",
        "eval_result = tfma.analyze_raw_data(\n",
        "  data=_LSAT_DF,\n",
        "  eval_config=eval_config,\n",
        "  output_path=_DATA_ROOT)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KD96mw0e--DE"
      },
      "source": [
        "## Explore model performance with Fairness Indicators.\n",
        "\n",
        "After running Fairness Indicators, we can visualize different metrics that we selected to analyze our models performance. Within this case study we’ve included Fairness Indicators and arbitrarily picked AUC.\n",
        "\n",
        "When we first look at the overall AUC for each race slice we can see a slight discrepancy in model performance, but nothing that is arguably alarming.\n",
        "\n",
        "*   **Asian**: 0.58\n",
        "*   **Black**: 0.58\n",
        "*   **Hispanic**: 0.58\n",
        "*   **Other**: 0.64\n",
        "*   **White**: 0.6\n",
        "\n",
        "However, when we look at the false negative rates split by race, our model again incorrectly predicts the likelihood of a user passing the bar at different rates and, this time, does so by a lot. \n",
        "\n",
        "*   **Asian**: 0.01\n",
        "*   **Black**: 0.05\n",
        "*   **Hispanic**: 0.02\n",
        "*   **Other**: 0.01\n",
        "*   **White**: 0.01\n",
        "\n",
        "Most notably the difference between Black and White students is about 380%, meaning that our model is nearly 4x more likely to incorrectly predict that a black student will not pass the bar, than a whilte student. If we were to continue with this effort, a practitioner could use these results as a signal that they should spend more time ensuring that their model works well for people from all backgrounds."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NIdchYPb-_ZV"
      },
      "outputs": [],
      "source": [
        "# Render Fairness Indicators.\n",
        "tfma.addons.fairness.view.widget_view.render_fairness_indicator(eval_result)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NprhBTCbY1sF"
      },
      "source": [
        "# tfma.EvalResult"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6f92-e98Y40r"
      },
      "source": [
        "The [`eval_result`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalResult) object, rendered above in `render_fairness_indicator()`, has its own API that can be used to read TFMA results into your programs."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CDDUxdx-Y8e0"
      },
      "source": [
        "## [`get_slice_names()`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalResult#get_slice_names) and [`get_metric_names()`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalResult#get_metric_names)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oG_mNUNbY98t"
      },
      "source": [
        "To get the evaluated slices and metrics, you can use the respective functions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kbA1sXhCY_G7"
      },
      "outputs": [],
      "source": [
        "pp = pprint.PrettyPrinter()\n",
        "\n",
        "print(\"Slices:\")\n",
        "pp.pprint(eval_result.get_slice_names())\n",
        "print(\"\\nMetrics:\")\n",
        "pp.pprint(eval_result.get_metric_names())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rA1M8aBmZAk6"
      },
      "source": [
        "## [`get_metrics_for_slice()`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalResult#get_metrics_for_slice) and [`get_metrics_for_all_slices()`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalResult#get_metrics_for_all_slices)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a3Ath5MsZCRX"
      },
      "source": [
        "If you want to get the metrics for a particular slice, you can use `get_metrics_for_slice()`. It returns a dictionary mapping metric names to [metric values](https://github.com/tensorflow/model-analysis/blob/cdb6790dcd7a37c82afb493859b3ef4898963fee/tensorflow_model_analysis/proto/metrics_for_slice.proto#L194)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9BWg5HoyZDh-"
      },
      "outputs": [],
      "source": [
        "baseline_slice = ()\n",
        "black_slice = (('race1', 'black'),)\n",
        "\n",
        "print(\"Baseline metric values:\")\n",
        "pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))\n",
        "print(\"Black metric values:\")\n",
        "pp.pprint(eval_result.get_metrics_for_slice(black_slice))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bDcOxvqBZEfg"
      },
      "source": [
        "If you want to get the metrics for all slices, `get_metrics_for_all_slices()` returns a dictionary mapping each slice to the corresponding `get_metrics_for_slices(slice)`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "p4NQCi52ZFrw"
      },
      "outputs": [],
      "source": [
        "pp.pprint(eval_result.get_metrics_for_all_slices())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "y-nbqnSTkmW3"
      },
      "source": [
        "## Conclusion\n",
        "Within this case study we imported a dataset into a Pandas DataFrame that we then analyzed with Fairness Indicators. Understanding the results of your model and underlying data is an important step in ensuring your model doesn't reflect harmful bias. In the context of this case study we examined the the LSAC dataset and how predictions from this data could be impacted by a students race. The concept of “what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning.”\u003csup\u003e1\u003c/sup\u003e Fairness Indicator is a tool to help mitigate fairness concerns in your machine learning model.\n",
        "\n",
        "For more information on using Fairness Indicators and resources to learn more about fairness concerns see [here](https://www.tensorflow.org/responsible_ai/fairness_indicators/guide).\n",
        "\n",
        "---\n",
        "\n",
        "1. Hutchinson, B., Mitchell, M. (2018). 50 Years of Test (Un)fairness: Lessons for Machine Learning. https://arxiv.org/abs/1811.10104\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "REV1rBnoBAo1"
      },
      "source": [
        "## Appendix\n",
        "\n",
        "Below are a few functions to help convert ML models to Pandas DataFrame.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "F4qv9GXiBsFA"
      },
      "outputs": [],
      "source": [
        "# TensorFlow Estimator to Pandas DataFrame:\n",
        "\n",
        "# _X_VALUE =  # X value of binary estimator.\n",
        "# _Y_VALUE =  # Y value of binary estimator.\n",
        "# _GROUND_TRUTH_LABEL =  # Ground truth value of binary estimator.\n",
        "\n",
        "def _get_predicted_probabilities(estimator, input_df, get_input_fn):\n",
        "  predictions = estimator.predict(\n",
        "      input_fn=get_input_fn(input_df=input_df, num_epochs=1))\n",
        "  return [prediction['probabilities'][1] for prediction in predictions]\n",
        "\n",
        "def _get_input_fn_law(input_df, num_epochs, batch_size=None):\n",
        "  return tf.compat.v1.estimator.inputs.pandas_input_fn(\n",
        "      x=input_df[[_X_VALUE, _Y_VALUE]],\n",
        "      y=input_df[_GROUND_TRUTH_LABEL],\n",
        "      num_epochs=num_epochs,\n",
        "      batch_size=batch_size or len(input_df),\n",
        "      shuffle=False)\n",
        "\n",
        "def estimator_to_dataframe(estimator, input_df, num_keypoints=20):\n",
        "  x = np.linspace(min(input_df[_X_VALUE]), max(input_df[_X_VALUE]), num_keypoints)\n",
        "  y = np.linspace(min(input_df[_Y_VALUE]), max(input_df[_Y_VALUE]), num_keypoints)\n",
        "\n",
        "  x_grid, y_grid = np.meshgrid(x, y)\n",
        "\n",
        "  positions = np.vstack([x_grid.ravel(), y_grid.ravel()])\n",
        "  plot_df = pd.DataFrame(positions.T, columns=[_X_VALUE, _Y_VALUE])\n",
        "  plot_df[_GROUND_TRUTH_LABEL] = np.ones(len(plot_df))\n",
        "  predictions = _get_predicted_probabilities(\n",
        "      estimator=estimator, input_df=plot_df, get_input_fn=_get_input_fn_law)\n",
        "  return pd.DataFrame(\n",
        "      data=np.array(np.reshape(predictions, x_grid.shape)).flatten())"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "Bfrh3DUze0QN"
      ],
      "name": "Pandas DataFrame to Fairness Indicators Case Study",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
